Fix edge cases for already exists resources #5239
Conversation
Force-pushed from e08fa50 to d71f726.
Codecov Report
@@ Coverage Diff @@
## main #5239 +/- ##
==========================================
- Coverage 41.43% 41.39% -0.04%
==========================================
Files 231 231
Lines 19666 19679 +13
==========================================
- Hits 8149 8147 -2
- Misses 10923 10936 +13
- Partials 594 596 +2
FYI, I have an alternative approach to this PR at openshift#188, which puts most of the logic behind an isAlreadyExistsError function instead.
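For readers unfamiliar with that approach, a minimal sketch of what such a helper could look like (illustrative only, not the actual code from openshift#188):

```go
// Illustrative sketch only -- not the implementation in openshift#188.
// Centralizing the check lets the restore flow branch on "already exists"
// failures in one place instead of inspecting errors inline.
package restore

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// isAlreadyExistsError reports whether err indicates that the resource
// being restored already exists in the target cluster.
func isAlreadyExistsError(err error) bool {
	return err != nil && apierrors.IsAlreadyExists(err)
}
```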
pkg/restore/restore.go (outdated)
if fromCluster == nil {
    fromCluster, err = resourceClient.Get(name, metav1.GetOptions{})
    if err != nil {
        ctx.log.Infof("Error retrieving cluster version of %s: %v", kube.NamespaceAndName(obj), err)
I know this is copied from existing code, but it seems more consistent (e.g. with the code at line 1247) if we merge this err into errors.
We may also want to change the log level.
@reasonerjt Hmm. I'm not sure, but yeah, you may be right. This would be a change in behavior for situations not otherwise modified by this PR. The scenario we're talking about is that the Create failed because the object already exists, but now when we do a Get call (to determine whether the in-cluster object is different from the from-backup object), the Get call fails. Ordinarily this shouldn't happen. Off the top of my head, possible causes are:
- Something outside of Velero just created this resource at almost the same time as Velero tried to restore it, and we're in a race scenario where the second Create fails due to an already existing object but the API server isn't yet finding it via Get.
- A bug or malfunction in the API server.
We're currently flagging this as a restore warning. If we want to stick with that, then we should probably change the log level from Info to Warn. The other possibility is we bump this up to a restore error and the log level to Error as well.
@shubham-pampattiwar what do you think?
@reasonerjt @sseago I agree with changing the log level to error and adding this error to errors (since the next piece of the workflow depends on the object fetched from the cluster).
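For context, a rough sketch of the agreed change within the existing restore flow; the variable names (errs, warnings, namespace) and the surrounding control flow are illustrative, not the exact code merged in this PR:

```go
// Rough sketch of the agreed change; names and control flow are illustrative.
if fromCluster == nil {
	fromCluster, err = resourceClient.Get(name, metav1.GetOptions{})
	if err != nil {
		// Log at Error level and record a restore error rather than an
		// Info-level message, since the rest of the workflow depends on
		// the object fetched from the cluster.
		ctx.log.Errorf("Error retrieving in-cluster version of %s: %v", kube.NamespaceAndName(obj), err)
		errs.Add(namespace, err)
		return warnings, errs
	}
}
```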
Done, updated the PR.
Signed-off-by: Shubham Pampattiwar <[email protected]>

add changelog file
Signed-off-by: Shubham Pampattiwar <[email protected]>

update changelog filename
Signed-off-by: Shubham Pampattiwar <[email protected]>

change log level and error type
Signed-off-by: Shubham Pampattiwar <[email protected]>
Force-pushed from d71f726 to 93a8758.
Signed-off-by: Shubham Pampattiwar <[email protected]>
Thank you for contributing to Velero!
Does your change fix a particular issue?
Fixes #5223
Please indicate you've done the following:
- Created a changelog file, or added /kind changelog-not-required as a comment on this pull request.
- Updated the corresponding documentation in site/content/docs/main.